视觉变换器将每个图像分成具有固定长度的令牌序列,并以与自然语言处理中的单词相同的方式处理令牌。更多令牌通​​常会导致更好的性能,但计算成本显着增加。通过谚语“一张图片胜过千言万语”,我们的目标是通过制造长图像短而加速VIT模型。为此,我们提出了一种新颖的方法在推论期间自适应地分配令牌长度。具体而言,我们首先培养一种含有可调整化 - vit(Revit)的Vit模型,可以处理任何具有不同令牌长度的给定输入。然后,我们从Revit检索“令牌长度标签”,并使用它培训轻量级令牌长度分配(TLA)。令牌长度标签是最小的令牌,以分割Revit可以使REVIT可以进行正确的预测,并且学习TLA以基于这些标签分配最佳令牌长度。 TLA使REVIT能够在推理期间使用最小足够数量的令牌处理图像。因此,通过减少VIT模型中的令牌数字来提高推广速度。我们的方法是一般的,与现代视觉变压器架构兼容,可以显着减少计算扩展。我们在两个任务中验证了我们对多个代表性VIT模型(DEIT,LV-VIT和TIMESFRER)的效果(图像分类和动作识别)。
translated by 谷歌翻译
这项工作调查了神经架构搜索中的批量标准化(NAS)。具体来说,Frankle等人。发现培训Batchnorm只能实现非竞争性能。此外,陈等人。声称培训Batchnorm只能加快10次单次NAS超网关的培训。批判性地,没有努力理解1)为什么训练Batchnorm只能找到具有减少的超空网训练时间的表演井架构,而且2)列车-BN的超网和标准列车超空网之间有什么区别。我们首先显示列车-BN网络融合到神经切线内核制度,从理论上获得与所有参数的所有参数相同的训练动态。我们的证据支持索赔仅在超培训时间上训练Batchnorm。然后,我们经验披露了培训-BN的超标网络在其他运营商的卷曲中提供了优势,导致架构之间的不公平竞争。这是因为只有卷积运算符被附加到Batchnorm。通过实验,我们表明这种不公平性使得搜索算法容易选择具有卷积的模型。为了解决这个问题,我们通过在每个操作员上放置批处理层来引入搜索空间的公平性。然而,我们观察到Chen等人的性能预测因子。在新的搜索空间上不可应用。为此,我们提出了一种新颖的综合性能指标,从三个视角评估网络:源自Batchnorm的理论属性的表达性,培训和不确定性。我们展示了我们对多NAS基准的方法(NAS-BENCH101,NAS-BENCH-201)和搜索空间(飞镖搜索空间和MOBILENET搜索空间)的有效性。
translated by 谷歌翻译
This paper illustrates the technologies of user next intent prediction with a concept knowledge graph. The system has been deployed on the Web at Alipay, serving more than 100 million daily active users. Specifically, we propose AlipayKG to explicitly characterize user intent, which is an offline concept knowledge graph in the Life-Service domain modeling the historical behaviors of users, the rich content interacted by users and the relations between them. We further introduce a Transformer-based model which integrates expert rules from the knowledge graph to infer the online user's next intent. Experimental results demonstrate that the proposed system can effectively enhance the performance of the downstream tasks while retaining explainability.
translated by 谷歌翻译
Deep neural networks (DNNs) are found to be vulnerable to adversarial attacks, and various methods have been proposed for the defense. Among these methods, adversarial training has been drawing increasing attention because of its simplicity and effectiveness. However, the performance of the adversarial training is greatly limited by the architectures of target DNNs, which often makes the resulting DNNs with poor accuracy and unsatisfactory robustness. To address this problem, we propose DSARA to automatically search for the neural architectures that are accurate and robust after adversarial training. In particular, we design a novel cell-based search space specially for adversarial training, which improves the accuracy and the robustness upper bound of the searched architectures by carefully designing the placement of the cells and the proportional relationship of the filter numbers. Then we propose a two-stage search strategy to search for both accurate and robust neural architectures. At the first stage, the architecture parameters are optimized to minimize the adversarial loss, which makes full use of the effectiveness of the adversarial training in enhancing the robustness. At the second stage, the architecture parameters are optimized to minimize both the natural loss and the adversarial loss utilizing the proposed multi-objective adversarial training method, so that the searched neural architectures are both accurate and robust. We evaluate the proposed algorithm under natural data and various adversarial attacks, which reveals the superiority of the proposed method in terms of both accurate and robust architectures. We also conclude that accurate and robust neural architectures tend to deploy very different structures near the input and the output, which has great practical significance on both hand-crafting and automatically designing of accurate and robust neural architectures.
translated by 谷歌翻译
A computational graph in a deep neural network (DNN) denotes a specific data flow diagram (DFD) composed of many tensors and operators. Existing toolkits for visualizing computational graphs are not applicable when the structure is highly complicated and large-scale (e.g., BERT [1]). To address this problem, we propose leveraging a suite of visual simplification techniques, including a cycle-removing method, a module-based edge-pruning algorithm, and an isomorphic subgraph stacking strategy. We design and implement an interactive visualization system that is suitable for computational graphs with up to 10 thousand elements. Experimental results and usage scenarios demonstrate that our tool reduces 60% elements on average and hence enhances the performance for recognizing and diagnosing DNN models. Our contributions are integrated into an open-source DNN visualization toolkit, namely, MindInsight [2].
translated by 谷歌翻译
Reasoning, as an essential ability for complex problem-solving, can provide back-end support for various real-world applications, such as medical diagnosis, negotiation, etc. This paper provides a comprehensive survey of cutting-edge research on reasoning with language model prompting. We introduce research works with comparisons and summaries and provide systematic resources to help beginners. We also discuss the potential reasons for emerging such reasoning abilities and highlight future research directions.
translated by 谷歌翻译
Conversational text-to-SQL is designed to translate multi-turn natural language questions into their corresponding SQL queries. Most state-of-the-art conversational text- to-SQL methods are incompatible with generative pre-trained language models (PLMs), such as T5. In this paper, we present a two-stage unified MultI-task Generation frAmework (MIGA) that leverages PLMs' ability to tackle conversational text-to-SQL. In the pre-training stage, MIGA first decomposes the main task into several related sub-tasks and then unifies them into the same sequence-to-sequence (Seq2Seq) paradigm with task-specific natural language prompts to boost the main task from multi-task training. Later in the fine-tuning stage, we propose four SQL perturbations to alleviate the error propagation problem. MIGA tends to achieve state-of-the-art performance on two benchmarks (SparC and CoSQL). We also provide extensive analyses and discussions to shed light on some new perspectives for conversational text-to-SQL.
translated by 谷歌翻译
Deep convolutional neural networks (CNNs) have been widely used for medical image segmentation. In most studies, only the output layer is exploited to compute the final segmentation results and the hidden representations of the deep learned features have not been well understood. In this paper, we propose a prototype segmentation (ProtoSeg) method to compute a binary segmentation map based on deep features. We measure the segmentation abilities of the features by computing the Dice between the feature segmentation map and ground-truth, named as the segmentation ability score (SA score for short). The corresponding SA score can quantify the segmentation abilities of deep features in different layers and units to understand the deep neural networks for segmentation. In addition, our method can provide a mean SA score which can give a performance estimation of the output on the test images without ground-truth. Finally, we use the proposed ProtoSeg method to compute the segmentation map directly on input images to further understand the segmentation ability of each input image. Results are presented on segmenting tumors in brain MRI, lesions in skin images, COVID-related abnormality in CT images, prostate segmentation in abdominal MRI, and pancreatic mass segmentation in CT images. Our method can provide new insights for interpreting and explainable AI systems for medical image segmentation. Our code is available on: \url{https://github.com/shengfly/ProtoSeg}.
translated by 谷歌翻译
Federated learning achieves joint training of deep models by connecting decentralized data sources, which can significantly mitigate the risk of privacy leakage. However, in a more general case, the distributions of labels among clients are different, called ``label distribution skew''. Directly applying conventional federated learning without consideration of label distribution skew issue significantly hurts the performance of the global model. To this end, we propose a novel federated learning method, named FedMGD, to alleviate the performance degradation caused by the label distribution skew issue. It introduces a global Generative Adversarial Network to model the global data distribution without access to local datasets, so the global model can be trained using the global information of data distribution without privacy leakage. The experimental results demonstrate that our proposed method significantly outperforms the state-of-the-art on several public benchmarks. Code is available at \url{https://github.com/Sheng-T/FedMGD}.
translated by 谷歌翻译
In this work, we propose a lightweight integrated LiDAR-Inertial SLAM system with high efficiency and a great loop closure capacity. We found that the current State-of-the-art LiDAR-Inertial SLAM system has poor performance in loop closure. The LiDAR-Inertial SLAM system often fails with the large drifting and suffers from limited efficiency when faced with large-scale circumstances. In this work, firstly, to improve the speed of the whole LiDAR-Inertial SLAM system, we have proposed a new data structure of the sparse voxel-hashing to enhance the efficiency of the LiDAR-Inertial SLAM system. Secondly, to improve the point cloud-based localization performance, we have integrated the loop closure algorithms to improve the localization performance. Extensive experiments on the real-scene large-scale complicated circumstances demonstrate the great effectiveness and robustness of the proposed LiDAR-Inertial SLAM system.
translated by 谷歌翻译